Integrative machine learning approach for multi-class SCOP protein fold classification
Abstract
Classification and prediction of protein structure have been a central research theme in structural bioinformatics. Because proteins are unevenly distributed across the multiple SCOP classes, most discriminative machine learning methods suffer from the well-known ‘False Positives’ problem when learning from such data. We have devised eKISS, an ensemble machine learning method specifically designed to increase the coverage of positive examples when learning from imbalanced multi-class data sets. We applied eKISS to the classification of 25 SCOP folds and show that it improves on classical learning methods.
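The effect of class imbalance on coverage of positive examples can be illustrated with a minimal Python sketch. This is not the eKISS method, whose details are not given in the abstract. The fold labels, the class sizes, and the helper `per_class_recall` are all hypothetical, chosen only to show why overall accuracy can hide poor coverage of rare classes.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return {class: fraction of that class's examples predicted correctly}."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


# Hypothetical toy labels: fold "a" dominates, folds "b" and "c" are rare.
y_true = ["a"] * 90 + ["b"] * 6 + ["c"] * 4

# A naive baseline that always predicts the majority fold.
y_pred = ["a"] * len(y_true)

# Overall accuracy looks high because the majority fold dominates.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Per-class recall (coverage of each fold's positive examples) tells
# a different story: the rare folds are never recovered.
recall = per_class_recall(y_true, y_pred)
macro_recall = sum(recall.values()) / len(recall)

print("accuracy:", accuracy)
print("per-class recall:", recall)
print("macro-averaged recall:", macro_recall)
```

In this toy setting the baseline is right on 90 of 100 examples yet recovers none of the examples from the two rare folds, which is the kind of gap that per-class coverage measures are meant to expose.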
Similar resources
Multi-class protein fold recognition using support vector machines and neural networks
MOTIVATION: Protein fold recognition is an important approach to structure discovery without relying on sequence similarity. We study this approach with new multi-class classification methods and examine many issues important for a practical recognition system. RESULTS: Most current discriminative methods for protein fold prediction use the one-against-others method, which has the well-known '...
Multi-class Protein Fold Recognition Through a Symbolic-Statistical Framework
Protein fold recognition is an important problem in molecular biology. Symbolic machine learning approaches have been applied to automatically discover local structural signatures and relate these to the concept of fold in SCOP. However, most of these methods cannot handle uncertainty and are therefore unable to solve multiple prediction problems. In this paper we present an application of the ...
Multi-class protein fold classification using a new ensemble machine learning approach.
Protein structure classification represents an important process in understanding the associations between sequence and structure as well as possible functional and evolutionary relationships. Recent structural genomics initiatives and other high-throughput experiments have populated the biological databases at a rapid pace. The amount of structural data has made traditional methods such as man...
A novel ensemble of classifiers for protein fold recognition
Predicting the three-dimensional structure of a protein from its amino acid sequence is an important problem in bioinformatics and a challenging task for machine-learning algorithms. We propose a new ensemble of K-local hyperplane based on random subspace and feature selection, and tested it on a real-world dataset containing 27 SCOP folds from [C. Ding, I. Dubchak, Multi-class protein fold rec...
Protein Fold Classification using Kohonen's Self-Organizing Map
Protein fold classification is an important problem in bioinformatics and a challenging task for machine-learning algorithms. In this paper we present a solution that classifies protein folds using Kohonen’s Self-Organizing Map (SOM), together with a comparison of several approaches to protein fold classification. We use SOM, Fisher Linear Discriminant Analysis (FLD), K-Nearest Neighbour (KNN), Support...